Multimodal AI model
MultiSHAP: A Shapley-Based Framework for Explaining Cross-Modal Interactions in Multimodal AI Models
Multimodal AI models have achieved impressive performance in tasks that require integrating information from multiple modalities, such as vision and language. However, their "black-box" nature poses a major barrier to deployment in high-stakes applications where interpretability and trustworthiness are essential. Explaining cross-modal interactions in multimodal AI models remains a major challenge. While existing model explanation methods, such as attention maps and Grad-CAM, offer coarse insights into cross-modal relationships, they cannot precisely quantify the synergistic effects between modalities and are limited to open-source models with accessible internal weights. Here we introduce MultiSHAP, a model-agnostic interpretability framework that leverages the Shapley Interaction Index to attribute multimodal predictions to pairwise interactions between fine-grained visual and textual elements (such as image patches and text tokens), while being applicable to both open- and closed-source models. Our approach provides (1) instance-level explanations that reveal synergistic and suppressive cross-modal effects for individual samples ("why the model makes a specific prediction on this input"), and (2) dataset-level explanations that uncover generalizable interaction patterns across samples ("how the model integrates information across modalities"). Experiments on public multimodal benchmarks confirm that MultiSHAP faithfully captures cross-modal reasoning mechanisms, while real-world case studies demonstrate its practical utility. The framework extends beyond two modalities, offering a general solution for interpreting complex multimodal AI models.
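The pairwise Shapley Interaction Index that MultiSHAP builds on can be sketched with a small Monte Carlo estimator. This is an illustrative implementation of the standard game-theoretic definition, not the authors' code: the set-value function `f`, the player identifiers, and the masking scheme are all assumptions, where `f(S)` would stand for the model's prediction with all patches/tokens outside `S` masked out. Sampling the coalition size uniformly and then a uniform subset of that size reproduces the Shapley weighting exactly, so the average of the second differences is an unbiased estimate.

```python
import random

def shapley_interaction(f, players, i, j, n_samples=200, rng=None):
    """Monte Carlo estimate of the pairwise Shapley Interaction Index
    between players i and j (e.g. an image patch and a text token).

    f       : set-value function mapping a frozenset of players to a score
              (hypothetically, the model output with everything else masked).
    players : all player ids (image patches + text tokens).
    """
    rng = rng or random.Random(0)
    others = [p for p in players if p not in (i, j)]
    total = 0.0
    for _ in range(n_samples):
        # Uniform coalition size, then a uniform subset of that size:
        # this matches the |S|!(n-|S|-2)!/(n-1)! Shapley interaction weights.
        k = rng.randint(0, len(others))
        S = frozenset(rng.sample(others, k))
        # Discrete second difference: the synergy of {i, j} on top of S.
        # Positive values indicate synergistic, negative suppressive effects.
        total += f(S | {i, j}) - f(S | {i}) - f(S | {j}) + f(S)
    return total / n_samples

# Toy usage: a pure "AND" game where patch p0 and token t0 only help together.
players = ["p0", "p1", "t0", "t1"]
f = lambda S: 1.0 if {"p0", "t0"} <= S else 0.0
print(shapley_interaction(f, players, "p0", "t0"))  # → 1.0 (full synergy)
```

In the toy game every second difference equals 1, so the estimate is exactly 1.0; for a real model the exact sum over coalitions is exponential in the number of patches and tokens, which is why a sampling scheme like this is the practical route.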
Multimodal MRI-Ultrasound AI for Prostate Cancer Detection Outperforms Radiologist MRI Interpretation: A Multi-Center Study
Jahanandish, Hassan, Sang, Shengtian, Li, Cynthia Xinran, Vesal, Sulaiman, Bhattacharya, Indrani, Lee, Jeong Hoon, Fan, Richard, Sonna, Geoffrey A., Rusu, Mirabela
Pre-biopsy magnetic resonance imaging (MRI) is increasingly used to target suspicious prostate lesions, and artificial intelligence (AI) applications have improved MRI-based detection of clinically significant prostate cancer (CsPCa). However, MRI-detected lesions must still be mapped to transrectal ultrasound (TRUS) images during biopsy, a step in which CsPCa lesions can be missed. This study systematically evaluates a multimodal AI framework that integrates MRI and TRUS image sequences to enhance CsPCa identification. The study included 3110 patients from three cohorts across two institutions who underwent prostate biopsy. The proposed framework, based on the 3D UNet architecture, was evaluated on 1700 test cases and compared against unimodal AI models that use either MRI or TRUS alone; it was also compared to radiologists in a cohort of 110 patients. The multimodal AI approach achieved superior sensitivity (80%) and Lesion Dice (42%) compared to the unimodal MRI (73%, 30%) and TRUS (49%, 27%) models. Compared to radiologists, the multimodal model showed higher specificity (88% vs. 78%) and Lesion Dice (38% vs. 33%), with equivalent sensitivity (79%). Our findings demonstrate the potential of multimodal AI to improve CsPCa lesion targeting during biopsy and treatment planning, surpassing current unimodal models and radiologists and ultimately improving outcomes for prostate cancer patients.
- North America > United States > California (0.05)
- North America > United States > New Hampshire (0.04)
- Europe > Switzerland (0.04)
- (2 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Health & Medicine > Therapeutic Area > Oncology > Prostate Cancer (0.96)
Navigating the landscape of multimodal AI in medicine: a scoping review on technical challenges and clinical applications
Schouten, Daan, Nicoletti, Giulia, Dille, Bas, Chia, Catherine, Vendittelli, Pierpaolo, Schuurmans, Megan, Litjens, Geert, Khalili, Nadieh
Recent technological advances in healthcare have led to unprecedented growth in patient data quantity and diversity. While artificial intelligence (AI) models have shown promising results in analyzing individual data modalities, there is increasing recognition that models integrating multiple complementary data sources, so-called multimodal AI, could enhance clinical decision-making. This scoping review examines the landscape of deep learning-based multimodal AI applications across the medical domain, analyzing 432 papers published between 2018 and 2024. We provide an extensive overview of multimodal AI development across different medical disciplines, examining various architectural approaches, fusion strategies, and common application areas. Our analysis reveals that multimodal AI models consistently outperform their unimodal counterparts, with an average improvement of 6.2 percentage points in AUC. However, several challenges persist, including cross-departmental coordination, heterogeneous data characteristics, and incomplete datasets. We critically assess the technical and practical challenges in developing multimodal AI systems and discuss potential strategies for their clinical implementation, including a brief overview of commercially available multimodal AI models for clinical decision-making. Additionally, we identify key factors driving multimodal AI development and propose recommendations to accelerate the field's maturation. This review provides researchers and clinicians with a thorough understanding of the current state, challenges, and future directions of multimodal AI in medicine.
- Europe > Italy > Piedmont > Turin Province > Turin (0.04)
- Europe > Netherlands > South Holland > Rotterdam (0.04)
- Europe > Switzerland (0.04)
- (5 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Overview (1.00)
Meta will reportedly withhold multimodal AI models from the EU amid regulatory uncertainty
Meta has decided not to offer its upcoming multimodal AI model and future versions to customers in the European Union, citing a lack of clarity from European regulators, according to a report by Axios. The models in question are designed to process not only text but also images and audio, and to power AI capabilities in Meta platforms as well as the company's Ray-Ban smart glasses. "We will release a multimodal Llama model over the coming months, but not in the EU due to the unpredictable nature of the European regulatory environment," Meta said in a statement to Axios. Meta's move follows a similar decision by Apple, which recently announced it would not release its Apple Intelligence features in Europe due to regulatory concerns. Margrethe Vestager, the EU's competition commissioner, had slammed Apple's move, saying that the company's decision was a "stunning, open declaration that they know 100 percent that this is another way of disabling competition where they have a stronghold already."
- Law (1.00)
- Information Technology > Security & Privacy (0.88)
- Government > Regional Government > Europe Government (0.60)
Official: GPT-4 is already here - Technology Org
OpenAI has just announced GPT-4, its next-generation large multimodal AI model for natural-language processing. Its predecessor, GPT-3, forms the basis of the famous ChatGPT artificial intelligence chatbot, capable of generating human-like responses to user queries. GPT-4 can accept both text and image inputs, allowing users to specify combined vision and language tasks. For example, you can now submit documents containing text plus photographs, diagrams, or even screenshots, and the generated answers draw on all of the provided data (try submitting a funny image and asking the AI what makes it amusing).
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.39)